Layer Normalization

Authors

  • Jimmy Ba
  • Ryan Kiros
  • Geoffrey E. Hinton
Abstract

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feedforward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
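
As a concrete illustration of the computation described above, here is a minimal NumPy sketch (our own illustration, not the authors' released code): the mean and standard deviation are taken over all hidden units of a single training case, an adaptive gain g and bias b are applied after normalization, and in the recurrent case the statistics are recomputed at every time step. The small eps constant and the tanh non-linearity are assumptions added for numerical stability and concreteness.

    import numpy as np

    def layer_norm(a, g, b, eps=1e-5):
        # Statistics over all summed inputs of the layer for a single
        # case (last axis), so they are independent of the batch size.
        mu = a.mean(axis=-1, keepdims=True)
        sigma = a.std(axis=-1, keepdims=True)
        # Adaptive gain and bias, applied after normalization and before
        # the non-linearity; eps (an assumption) avoids division by zero.
        return g * (a - mu) / (sigma + eps) + b

    def rnn_step(x_t, h_prev, W_x, W_h, g, b):
        # Recurrent case: normalization statistics are computed
        # separately at each time step from the summed inputs a_t.
        a_t = x_t @ W_x + h_prev @ W_h
        return np.tanh(layer_norm(a_t, g, b))  # tanh: assumed non-linearity

    # Tiny usage example with hypothetical sizes.
    rng = np.random.default_rng(0)
    D, H = 3, 4
    x_t, h_prev = rng.normal(size=D), rng.normal(size=H)
    W_x, W_h = rng.normal(size=(D, H)), rng.normal(size=(H, H))
    g, b = np.ones(H), np.zeros(H)  # common init: gain 1, bias 0
    h_t = rnn_step(x_t, h_prev, W_x, W_h, g, b)

Because the statistics come from a single case, the same computation runs unchanged at training and test time, which is the property the abstract contrasts with batch normalization.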

Related articles

Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations ac...

A New Normalization for Mfcc: Multi Layer Strategy and Recursive Progress

One main obstacle in speech recognition is what is called “robustness”. This paper focuses on one popular idea for countering the vulnerability of speech systems, channel normalization, and presents a new normalization algorithm, Multi-Layer Channel Normalization (MLCN), which exploits a recursive compensation process in two domains, the spectral domain and the cepstral domain, to suppress different noises, so that the m...

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the training speed and stability. In this paper, we introduce a new layer normalization technique called Dynamic Layer Normalization (DLN) for adaptive neural acoustic modeling in speech recognition. By dynamically generating the scaling and shifting parameters in ...

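The abstract above is truncated before it explains how the scaling and shifting parameters are generated. Purely as a hypothetical sketch of the general idea of input-dependent normalization parameters (not the DLN paper's actual formulation; it reuses layer_norm and numpy from the sketch above, and the scalar-summary and projection steps are entirely our assumptions):

    def dynamic_layer_norm(a, W_g, W_b, g0, b0):
        # Hypothetical: derive the gain and bias from a summary of the
        # current summed inputs instead of keeping them fixed.
        z = np.tanh(a.mean(axis=-1, keepdims=True))  # crude scalar summary (assumption)
        g = g0 + z * W_g  # input-dependent gain
        b = b0 + z * W_b  # input-dependent bias
        return layer_norm(a, g, b)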

Comparison of retinal layer intensity profiles from different OCT devices.

BACKGROUND AND OBJECTIVE: The purpose of this study is to use automated multiple retinal layer segmentation to compare retinal layer intensity profiles between different spectral-domain optical coherence tomography (SD-OCT) devices with and without normalization. PATIENTS AND METHODS: A graph-based multistage segmentation approach was used to identify 11 boundaries in horizontal SD-OCT B-scans ...

Journal:
  • CoRR

Volume: abs/1607.06450

Publication year: 2016